AITopics | Chiang Mai

Collaborating Authors

Chiang Mai

f88709551258331f9ab31b33c71021a4-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing SystemsFeb-12-2026, 22:33:22 GMT

computational linguistic, dataset, keyphrase, (11 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > Quebec > Montreal (0.05)
Asia > China > Heilongjiang Province > Daqing (0.04)
(16 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Information Management (0.93)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.46)

Add feedback

Conditional Counterfactual Mean Embeddings: Doubly Robust Estimation and Learning Rates

Anancharoenkij, Thatchanon, Ponnoprat, Donlapark

arXiv.org Machine LearningFeb-5-2026

A complete understanding of heterogeneous treatment effects involves characterizing the full conditional distribution of potential outcomes. To this end, we propose the Conditional Counterfactual Mean Embeddings (CCME), a framework that embeds conditional distributions of counterfactual outcomes into a reproducing kernel Hilbert space (RKHS). Under this framework, we develop a two-stage meta-estimator for CCME that accommodates any RKHS-valued regression in each stage. Based on this meta-estimator, we develop three practical CCME estimators: (1) Ridge Regression estimator, (2) Deep Feature estimator that parameterizes the feature map by a neural network, and (3) Neural-Kernel estimator that performs RKHS-valued regression, with the coefficients parameterized by a neural network. We provide finite-sample convergence rates for all estimators, establishing that they possess the double robustness property. Our experiments demonstrate that our estimators accurately recover distributional features including multimodal structure of conditional counterfactual distributions.

artificial intelligence, estimator, machine learning, (14 more...)

arXiv.org Machine Learning

2602.04736

Country:

North America > United States > New York > New York County > New York City (0.14)
Europe > Austria > Vienna (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(12 more...)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

He Leaked the Secrets of a Southeast Asian Scam Compound. Then He Had to Get Out Alive

WIREDJan-27-2026, 11:00:00 GMT

A source trapped inside an industrial-scale scamming operation contacted me, determined to expose his captors' crimes--and then escape. It was a perfect June evening in New York when I received my first email from the source who would ask me to call him Red Bull. He was writing from hell, 8,000 miles away. A summer shower had left a rainbow over my Brooklyn neighborhood, and my two children were playing in a kiddie pool on the roof of our apartment building. Now the sun was setting, while I--in typical 21st-century parenting fashion, forgive me--compulsively scrolled through every app on my phone. The message had no subject line and came from an address on the encrypted email service Proton Mail: "vaultwhistle@proton.me." I'm currently working inside a major crypto romance scam operation based in the Golden Triangle," it began. "I am a computer engineer being forced to work here under a contract." "I've collected internal evidence of how the scam works--step by step," the message ...

compound, opération, red bull, (14 more...)

WIRED

Country:

North America > United States > New York (0.24)
Asia > Laos (0.05)
Asia > Southeast Asia (0.04)
(14 more...)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Law (1.00)
Information Technology > Security & Privacy (1.00)
(3 more...)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Entity Alignment with Noisy Annotations from Large Language Models

Neural Information Processing SystemsOct-9-2025, 20:04:31 GMT

However, it is nontrivial to directly apply LLMs for EA since the annotation space in real-world KGs is large. LLMs could also generate noisy labels that may mislead the alignment.

alignment, entity alignment, experiment, (17 more...)

Neural Information Processing Systems

Country:

Asia > China > Hong Kong (0.05)
North America > United States > Hawaii (0.04)
Europe > Germany > North Rhine-Westphalia > Cologne Region > Bonn (0.04)
Asia > Thailand > Chiang Mai > Chiang Mai (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Add feedback

f88709551258331f9ab31b33c71021a4-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing SystemsAug-19-2025, 20:58:47 GMT

artificial intelligence, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > Quebec > Montreal (0.05)
Asia > China > Heilongjiang Province > Daqing (0.04)
(16 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Information Management (0.93)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.46)

Add feedback

Minimax Rates of Estimation for Optimal Transport Map between Infinite-Dimensional Spaces

Ponnoprat, Donlapark, Imaizumi, Masaaki

arXiv.org Machine LearningMay-28-2025

We investigate the estimation of an optimal transport map between probability measures on an infinite-dimensional space and reveal its minimax optimal rate. Optimal transport theory defines distances within a space of probability measures, utilizing an optimal transport map as its key component. Estimating the optimal transport map from samples finds several applications, such as simulating dynamics between probability measures and functional data analysis. However, some transport maps on infinite-dimensional spaces require exponential-order data for estimation, which undermines their applicability. In this paper, we investigate the estimation of an optimal transport map between infinite-dimensional spaces, focusing on optimal transport maps characterized by the notion of $γ$-smoothness. Consequently, we show that the order of the minimax risk is polynomial rate in the sample size even in the infinite-dimensional setup. We also develop an estimator whose estimation error matches the minimax optimal rate. With these results, we obtain a class of reasonably estimable optimal transport maps on infinite-dimensional spaces and a method for their estimation. Our experiments validate the theory and practical utility of our approach with application to functional data analysis.

artificial intelligence, estimator, machine learning, (18 more...)

arXiv.org Machine Learning

2505.1357

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Pacific Ocean (0.04)
North America > United States > New York (0.04)
(7 more...)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Add feedback

Probing BERT for German Compound Semantics

Miletić, Filip, Schmid, Aaron, Walde, Sabine Schulte im

arXiv.org Artificial IntelligenceMay-21-2025

This paper investigates the extent to which pretrained German BERT encodes knowledge of noun compound semantics. We comprehensively vary combinations of target tokens, layers, and cased vs. uncased models, and evaluate them by predicting the compositionality of 868 gold standard compounds. Looking at representational patterns within the transformer architecture, we observe trends comparable to equivalent prior work on English, with compositionality information most easily recoverable in the early layers. However, our strongest results clearly lag behind those reported for English, suggesting an inherently more difficult task in German. This may be due to the higher productivity of compounding in German than in English and the associated increase in constituent-level ambiguity, including in our target compound set.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2505.1413

Country:

South America > Colombia > Meta Department > Villavicencio (0.05)
Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)
Europe > Croatia > Dubrovnik-Neretva County > Dubrovnik (0.04)
(12 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

AutoMathKG: The automated mathematical knowledge graph based on LLM and vector database

Bian, Rong, Geng, Yu, Yang, Zijian, Cheng, Bing

arXiv.org Artificial IntelligenceMay-20-2025

A mathematical knowledge graph (KG) presents knowledge within the field of mathematics in a structured manner. Constructing a math KG using natural language is an essential but challenging task. There are two major limitations of existing works: first, they are constrained by corpus completeness, often discarding or manually supplementing incomplete knowledge; second, they typically fail to fully automate the integration of diverse knowledge sources. This paper proposes AutoMathKG, a high-quality, wide-coverage, and multi-dimensional math KG capable of automatic updates. AutoMathKG regards mathematics as a vast directed graph composed of Definition, Theorem, and Problem entities, with their reference relationships as edges. It integrates knowledge from ProofWiki, textbooks, arXiv papers, and TheoremQA, enhancing entities and relationships with large language models (LLMs) via in-context learning for data augmentation. To search for similar entities, MathVD, a vector database, is built through two designed embedding strategies using SBERT. To automatically update, two mechanisms are proposed. For knowledge completion mechanism, Math LLM is developed to interact with AutoMathKG, providing missing proofs or solutions. For knowledge fusion mechanism, MathVD is used to retrieve similar entities, and LLM is used to determine whether to merge with a candidate or add as a new entity. A wide range of experiments demonstrate the advanced performance and broad applicability of the AutoMathKG system, including superior reachability query results in MathVD compared to five baselines and robust mathematical reasoning capability in Math LLM.

information, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2505.13406

Country:

Europe > United Kingdom > North Sea > Southern North Sea (0.14)
Asia > China > Beijing > Beijing (0.04)
North America > United States > North Carolina > Wake County > Morrisville (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

An Empirical Study of the Impact of Federated Learning on Machine Learning Model Accuracy

Yang, Haotian, Wang, Zhuoran, Chou, Benson, Xu, Sophie, Wang, Hao, Wang, Jingxian, Zhang, Qizhen

arXiv.org Artificial IntelligenceMar-26-2025

Federated Learning (FL) enables distributed ML model training on private user data at the global scale. Despite the potential of FL demonstrated in many domains, an in-depth view of its impact on model accuracy remains unclear. In this paper, we investigate, systematically, how this learning paradigm can affect the accuracy of state-of-the-art ML models for a variety of ML tasks. We present an empirical study that involves various data types: text, image, audio, and video, and FL configuration knobs: data distribution, FL scale, client sampling, and local and global computations. Our experiments are conducted in a unified FL framework to achieve high fidelity, with substantial human efforts and resource investments. Based on the results, we perform a quantitative analysis of the impact of FL, and highlight challenging scenarios where applying FL degrades the accuracy of the model drastically and identify cases where the impact is negligible. The detailed and extensive findings can benefit practical deployments and future development of FL.

accuracy, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2503.20768

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Austria > Vienna (0.14)
Asia > Singapore (0.05)
(16 more...)

Genre: Research Report > New Finding (0.93)

Industry:

Information Technology > Security & Privacy (1.00)
Energy (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Detecting and Mitigating DDoS Attacks with AI: A Survey

Apostu, Alexandru, Gheorghe, Silviu, Hîji, Andrei, Cleju, Nicolae, Pătraşcu, Andrei, Rusu, Cristian, Ionescu, Radu, Irofti, Paul

arXiv.org Artificial IntelligenceMar-22-2025

Distributed Denial of Service attacks represent an active cybersecurity research problem. Recent research shifted from static rule-based defenses towards AI-based detection and mitigation. This comprehensive survey covers several key topics. Preeminently, state-of-the-art AI detection methods are discussed. An in-depth taxonomy based on manual expert hierarchies and an AI-generated dendrogram are provided, thus settling DDoS categorization ambiguities. An important discussion on available datasets follows, covering data format options and their role in training AI detection methods together with adversarial training and examples augmentation. Beyond detection, AI based mitigation techniques are surveyed as well. Finally, multiple open research directions are proposed.

data mining, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2503.17867

Country:

Europe > Romania > București - Ilfov Development Region > Municipality of Bucharest > Bucharest (0.05)
North America > United States > New York > New York County > New York City (0.04)
Asia > Thailand > Chiang Mai > Chiang Mai (0.04)
(3 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.87)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Networks (1.00)
(5 more...)

Add feedback